Structure and molecular evolution of the barcode fragment of cytochrome oxidase I (COI) in Macrocheles (Acari: Mesostigmata: Macrochelidae)

Abstract Consisting of approximately 320 species, Macrocheles is the most widely distributed genus in the family Macrochelidae. Though some studies have focused on the description of Macrochelidae using molecular techniques (e.g., RAPD) and sequencing of some genes, the interspecies relationships within Macrocheles still remain uncertain. As such, in the present study, we examine all publicly available data in GenBank to explore the evolutionary relationships, divergence times, and amino acid variations within Macrocheles. Exploring the patterns of variation in the secondary protein structure shows high levels of conservation in the second and last helices, emphasizing their involvement in the energy metabolism function of the cytochrome oxidase subunit I enzyme. According to our phylogenetic analysis, all available Macrocheles species are clustered in a monophyletic group. However, in the reconstructed trees, we subdivided M. merdarius and M. willowae into two well‐supported intraspecific clades that are driven by geographic separation and host specificity. We also estimate the divergence time of selected species using calibration evidence from available fossils and previous studies. Thus, we estimate that the age of the Parasitiformes is 320.4 (273.3–384.3) Mya (Permian), and the Mesostigmata is 285.1 (270.8–286.4) Mya (Carboniferous), both with likely origins in the Paleozoic era. We also estimate that Macrocheles diverged from other Mesostigmata mites during the Mesozoic, approximately 222.9 Mya.

ity in morphological features, and unknown cryptic diversity can always be uncovered using molecular techniques. Molecular techniques have proven highly effective for exploring the phylogenetic relationships and species boundaries of mites (Dabert et al., 2011;Li et al., 2010;Skoracka et al., 2012;Skoracka & Dabert, 2010;Yang et al., 2011;Zhao et al., 2014Zhao et al., , 2015. However, few studies have used molecular information to explore the phylogenetic relationships of macrochelids. Recent studies on the species of Macrocheles focused mainly on the descriptions of new species (Knee, 2017;Niogret et al., 2007;Niogret & Nicot, 2008). The interspecies relationships within Macrocheles, as well as the time of divergence from its most recent common ancestors, remain uncertain.
Mites have a long evolutionary history, dating back to at least 410 million years ago (Mya) based on their presence in early terrestrial fossil layers Dubinin, 1962;Hirst, 1923;Xue et al., 2017). However, their evolutionary history is only partially understood, and mites remain one of the least studied major branches of the animal tree of life (Dunlop & Alberti, 2008;Krantz & Walter, 2009). Mites are poorly represented in the fossil record compared with other arthropods, which may be due to mites being difficult to preserve, quite small, and easily overlooked in fossils (Sidorchuk, 2018). Of the two Acari superorders, the Acariformes are far better represented in the fossil record than the Parasitiformes (Fuente, 2003). Acariformes first originated c. 435 MYA in the Llandovery Silurian  and have been collected across the Cretaceous (Arillo et al., 2022;Porta et al., 2020;Sidorchuk et al., 2015). The oldest parasitiform fossils are from 90 Mya. Klompen & Grimaldi, 2001).
The poor representation of Parasitiformes in the fossil record could be because parasitiform mites rarely colonize the bark of trees and thus are rarely captured in amber and fossilized Van der Hammen, 1977). The Parasitiformes are likely much older than 90 Mya (Dunlop & Bernardi, 2014), but the poor representation of these mites in the fossil record impedes our understanding of their evolutionary origins. Molecular dating provides a means to uncover the origins of these mites.
In the present study, we aim to: (i) explore the phylogenetic relationships of Macrocheles species using the barcode region of cytochrome oxidase subunit I (COI) from publicly available data on GenBank, (ii) estimate the time of divergence for the genus using a molecular clock and available COI data, and (iii) characterize the amino acid and protein structure of COI in Macrocheles.

| Multiple sequence alignment and phylogenetic analysis
We began by downloading from GenBank all publicly available nucleotide sequences spanning the barcode region of COI, for named Macrocheles species (10 June 2021) ( Table A1). After aligning the nucleotide sequences of these species using MAFFTv.7 (Katoh & Standley, 2013), we visualized and inspected them manually using Mesquite v.3.61 (Maddison & Maddison, 2015). We selected a Parasitidae species (MN353534) as the outgroup because, based on BLAST search results, its sequence was one of the most closely related to Macrochelidae.
Implementing the ModelFinder program of IQ-TREE, we generated a maximum likelihood (ML) based on nucleotide sequences of selected taxa (Martin et al., 2015), creating an ML tree based on the best substitution model. To evaluate branch support for the ML tree, we used 1000 replicates of Ultrafast bootstrap (UFBoot) (Felsenstein, 1985).

replicates.
Finally, we examined the phylogenetic relationships of taxa using SplitsTree v4.16.1 (Huson & Bryant, 2006) by creating a Neighbor-Net network from uncorrected patristic distances with 1000 bootstrap replications using the default setting.

| COI properties and protein structure
To examine the protein structure of COI among the 12 available species, we selected one sequence per species that did not have any ambiguous bases for further analysis. We calculated the percent identity of amino acid sequences of 12 selected species using BLAST's blastp version. To identify how many characters of target and query taxa were identical. We also calculated the pairwise distance between amino acid sequences by MEGAX using the default setting (Kumar et al., 2018).
After aligning the amino acid sequences of selected species using MAFFTv.7 (Katoh & Standley, 2013), we used different tools to examine the structure of the proteins at three levels: primary, secondary, and tertiary. To examine the primary structure of selected proteins, we counted the frequency of amino acids per protein and visualized the generated multiple sequence alignment (MSA) using a web-based sequence manipulation suite program to see the conservation pattern among the taxa (Stothard, 2000). To compare the tertiary structure of selected taxa, we first created a 3D model of Macrocheles willowae, which has the longest length among the 12 selected sequences. We validated the model using ProSA-web (Wiederstein & Sippl, 2007) and a Ramachandran plot (https://zlab.umass med.edu/bu/rama/index.pl). To illustrate the conservation pattern of all sequences in relation to the 3D structure, we mapped the 12 selected amino acid sequences on the surface of the modeled structure using the ConSurf server (Ashkenazy et al., 2016).
Finally, we detected the physicochemical properties of the proteins using the ProtParam program (https://web.expasy.org/cgi-bin/ protp aram).

| Molecular dating
The estimated age of selected Macrocheles species was determined using BEASTv2.6.3. To accurately calibrate the estimated age analysis, however, reliable fossils would be required. Although extant parasitiform mites occur in a wide array of habitats worldwide, they are infrequently collected in fossil records. Acariformes mites make up most fossilized mite collections (Dunlop et al., 2018). As a result, the fossil records for Mesostigmata in our analysis were limited to just Aclerogamasus sp. (Parasitidae) and Myrmozercon sp. (Laelapidae). Therefore, we used the estimated age of Parasitiformes, which is based on the latest Acari dating study of Arribas et al. (2020), as the secondary calibration ( Table 1). This calibration method is commonly used to calibrate the trees in a new study based on an estimated date of a node from previous studies. To calculate the uncertainty of estimates, we used lognormal distribution and normal distribution for the fossil and secondary calibrations, respectively.
For molecular dating analysis, we employed the GTR + G4 substitution model (suggested by the IQ-TREE analysis) and the Yule process model, along with a relaxed lognormal clock as the simplest speciation model. We set the screen log to 10 4 and the chain length to 100 million, and we used the configuration file generated by BEAUTYv2.6.3. To check the congruency among parameters, we evaluated the results by Tracer v1.6 (Bouckaert et al., 2014).

| Phylogenetic analysis
The COI alignment generated based on nucleotide sequence from publicly available GenBank data on 111 Macrocheles specimens and one Parasitidae species produced an alignment with 658 characters in total, with 45% parsimony-informative, 3% singleton, and 52% constant sites. The best fit model based on the Bayesian information criterion was TIM2+F+I+ by the ModelFinder program implemented in IQ-TREE. The congruency of two BI and ML trees that we examined using several tests implemented in IQ-TREE program are summarized in Table 2. While the log-likelihood of the two trees was identical (marked as a plus) based on the calculated bootstrap portion of both RELL (Kishino et al., 1990) and expected likelihood weight (Strimmer & Rambaut, 2002) methods, the estimated p-values of the remaining tests ( Table 2) were less than 0.05 indicating the significant difference between the two trees (marked as a minus).
All 12  We further divided two species, M. merdarius and M. willowae, into two well-supported clades.

TA B L E 2
The result of different topological congruency tests performed by IQ-TREE program a .
a logL is log of likelihood; deltaL is the difference of calculated logL between two trees; bp-RELL is bootstrap proportion using RELL method (Kishino et al., 1990); p-KH is the estimates p-value of the one-sided Kishino-Hasegawa test (Kishino & Hasegawa, 1989); p-SH is the p-value of the Shimodaira-Hasegawa test (Shimodaira & Hasegawa, 1999); p-WKH is the p-value of the weighted Kishino-Hasegawa test; p-WSH is the p-value of the weighted Shimodaira-Hasegawa test; c-ELW is expected likelihood weight (Strimmer & Rambaut, 2002), and p-AU is the p-value of approximately the unbiased (AU) test (Shimodaira, 2002); Plus signs indicate 95% confidence test. Minus sign denotes significant exclusion.
F I G U R E 1 (a) BI phylogenetic tree created by MrBayes and (b) the reconstructed ML tree generated by IQ-TREE. The phylogenetic analysis was based on nucleotide sequences including 658 characters of COI marker of 111 Macrocheles mites representing 12 species rooted by a Parasitidae species. The estimated posterior probability of less than 1 and bootstrap support value of less than 100 of the main nodes are shown for each node in a and b, respectively. The trees have been illustrated using Figtree, and each species is denoted by a unique color. The clade with different clustering patterns between two trees has been shown by a dashed box.

(a) (b)
The reticulated network analysis demonstrated that the clustering pattern of the 12 examined macrochelid species was generally congruent with the ML and BI trees.
Most of the morphologically defined species were well delineated by the network analysis, while the relatively large intraspecific divergences among M. merdarius and M. willowae were also wellsupported ( Figure 2).

| COI protein structure and properties
The We have summarized several physicochemical properties of selected proteins in Table 3. The highest value per each property is indicated as a bold number.
The estimated instability index of all proteins varied between 32.10 (for M. merdarius) to 39.65 (for M. praedafimetorum). The instability index of all selected proteins is less than 40, indicating that thy are stable proteins in vitro (Ignatova et al., 2007). By contrast, the grand average of hydropathicity of the selected 12 proteins (sum of the hydropathicity = value of each amino acid / total number of amino acids) ranged from 0.731 to 0.931. This is related to M. penicilliger and M. praedafimetorum, which can be considered the most and least hydrophobic proteins, respectively.
The frequencies of amino acids in COI for the 12 Macrocheles species were generally quite similar (Figure 4).
The secondary structure of the COI region contains five helices (H1-H5) connected by five loops. Of the 230 amino acid characters we examined, 67% were conserved across all 12 Macrocheles species, 36% of which were located in helix regions (Figure 5a; Tables A2 and A3). The predicted positions of each helix were quite similar across species, with only minor variability (Figure 5c). Across the 12 species, 50, 91, 35, 78, and 87% of amino acid sequences located in H1, H2, H3, H4, and H5 regions, respectively, were conserved. Accordingly, H2 is the most conserved helix, followed by H5. The hydropathicity profile in all species estimated by the ProtScale program ( Figure 5c) was quite similar. As is clear from the graphs in Figure 5c, each of the five helices (H1-H5) has a distinctly high hydrophobicity score.
After analyzing the primary and secondary structures of the studied proteins, we mapped all amino acid sequences on the predicted model of the COI marker, M. willowae (AUC63267) ( Figure A1). The modeled 3D structure confirmed a high F I G U R E 3 Heatmap based on a matrix including the (a) identity (%) between the amino acid sequences of 12 Macrocheles species, calculated using blastp, and (b) pairwise distance between amino acid sequence of 12 species calculated by MEGA X. The color intensity indicates the level of identity and distance between the species. The white color indicates the lowest number while dark green, and dark red color represents the highest number in two created heatmaps.

TA B L E 3 Physicochemical properties of the barcode fragment of COI from
Macrocheles based on amino acid sequences of 12 machrochelid species.

F I G U R E 4
The frequencies of amino acid in the barcode region of COI for 12 Macrocheles species. The amino acids have been clustered based on biochemical properties into five groups named, nonpolar, positive, aromatic, polar-unchanged, and negative, which is shown below the generated graph.  (Tables A2 and A3). (c) the hydropathy profiles estimated by ProtScale program. The y-axis is related to the hydrophobicity score (higher score, more hydrophobicity), and x-axis shows the position of amino acids per taxa. conservation of COI amino acid sequences in the H2 and H5 regions ( Figure 6).

| Macrocheles divergence time
The time-calibrated phylogeny tree created from our BEAST analysis is displayed in Figure 7 and the divergence time of the Macrocheles species, and other taxa used in this study are summarized in Table 4.
Considering this, we estimated that the age of the Parasitiformes

| DISCUSS ION
In this study, we explored the phylogenetics of Macrocheles species using COI and characterized the amino acid and protein structure of COI for the genus using publicly available sequence data. While a few studies have focused on the molecular characterization of COI in some arthropods, there is no existing information about the COI structure of Macrocheles.
To gain a better understanding about the variability of COI, we compared the primary, secondary, and tertiary protein structures in Macrocheles in other groups of arthropods as well (Pentinsaari et al., 2016;Sabir et al., 2019).
In the secondary and tertiary structure of the COI barcode of Macrocheles, helices are more variable than loops, which might be because of the role of loops in stability of the proteins (Ulmschneider et al., 2005). However, of five detected helices, the second helix and the last helix were the most conserved. This may be due to their important function of penetrating the mitochondrial membrane, which changed the structure of the enzyme and likely reduced the energy metabolism of the cells (Mathews et al., 2013). Arthropods are believed to have originated during the Cambrian of the early Paleozoic (Arribas et al., 2020), and some arachnid groups (e.g., spiders, scorpions, mites) diversified more heavily during the late Paleozoic (Jeyaprakash & Hoy, 2009 (Sellwood & Valdes, 2006). This warming spurred the diversification of the angiosperms, which in turn resulted in an explosion of insect and mite species (Labandeira et al., 2007: Schuettpelz & Pryer, 2009).

ACK N OWLED G M ENTS
This study was financially supported by Vali-e-Asr University of Rafsanjan Department of Plant Protection.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that supports the findings of this study are available in the supplementary material of this article

R E FE R E N C E S
F I G U R E A 1 (a) Three-dimensional structure orientation of COI marker of M. willowae (AUC63267) predicted by SWISS MODEL online tools (http://swiss model.expasy.org/) based on an available COX protein (7JRO.1) (Maldonado et al., 2021) in public protein data bank (PDB) as model. 3D structure evaluation of the model has been displayed by (b) Ramachandran plot and (c and d) ProsA-web tool. In graph c, the x-axis is related to z-score (indicating the quality of 3D structure) and y-axis indicates the number of available 3D structure, which are determined by different methods; the light blue and dark blue dots shows the z-score of the proteins, which are the result of X-ray or NMR determination methods. The z-score of our model have been shown by a black dot in graph (c). The dark and light green lines in graph (d) display estimated average energy over each 40-and 10-residue fragment, respectively.